Exploring the Performance Potential of Itanium® Processors with ILP-based Scheduling
Abstract
HP and Intel’s Itanium Processor Family (IPF) is considered one of the most challenging processor architectures to generate code for. During global instruction scheduling, the compiler must balance the use of strongly interdependent techniques such as code motion, speculation, and predication. Applying these features too conservatively can leave execution slots empty, contrary to the EPIC philosophy, but overusing them can cause resource shortages that cancel out the benefit. We tackle this problem using integer linear programming (ILP), a proven standard optimization method. Our ILP model comprises global, partial-ready code motion with automated generation of compensation code as well as vital IPF features such as control and data speculation and predication. The ILP approach can, with some restrictions, resolve the interdependences between these decisions and deliver the global optimum. This promises a speedup for compute-intensive applications as well as some theoretically founded insights into the potential of the architecture. Experiments with several hot functions from the SPEC benchmarks show substantial improvements: our postpass optimizer reduces the schedule lengths produced by Intel’s compiler by about 20-40%. The resulting speedup of these routines is 16% on average.
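To make the idea of an optimal schedule concrete, here is a minimal sketch of the kind of constraints a scheduling model has to respect. It is not the authors' ILP model: it covers only a single basic block, ignores code motion, speculation, and predication, and uses exhaustive search in place of an ILP solver. The function name `optimal_schedule` and its parameters (`latency`, `deps`, `width`, `horizon`) are hypothetical and chosen for illustration only.

```python
from itertools import product


def optimal_schedule(latency, deps, width, horizon):
    """Return (schedule_length, issue_cycles) with minimal length, or None.

    latency[i] : cycles until the result of instruction i is available
    deps       : pairs (i, j) meaning j consumes the result of i
    width      : maximum number of instructions issued per cycle (assumed uniform)
    horizon    : issue cycles 0 .. horizon-1 are considered
    """
    n = len(latency)
    best = None
    # Enumerate every assignment of an issue cycle to each instruction.
    for cycles in product(range(horizon), repeat=n):
        # Dependence constraint: a consumer issues no earlier than
        # its producer's issue cycle plus the producer's latency.
        if any(cycles[j] < cycles[i] + latency[i] for i, j in deps):
            continue
        # Resource constraint: at most `width` instructions per cycle.
        if any(cycles.count(t) > width for t in set(cycles)):
            continue
        # Objective: the cycle in which the last result becomes available.
        length = max(c + l for c, l in zip(cycles, latency))
        if best is None or length < best[0]:
            best = (length, cycles)
    return best


# A diamond-shaped dependence graph: 0 -> {1, 2} -> 3.
latency = [2, 1, 1, 1]
deps = [(0, 1), (0, 2), (1, 3), (2, 3)]
wide = optimal_schedule(latency, deps, width=2, horizon=5)
narrow = optimal_schedule(latency, deps, width=1, horizon=5)
```

With two issue slots per cycle, instructions 1 and 2 can share a cycle; with one slot they must be serialized, which lengthens the schedule by one cycle. An ILP formulation expresses the same dependence and resource conditions as linear constraints over 0/1 variables so that a solver can find the optimum without enumerating every assignment.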
Similar resources
Optimal Global Instruction Scheduling for the Itanium
On the Itanium 2 processor, effective global instruction scheduling is crucial to high performance. At the same time, it poses a challenge to the compiler: This code generation subtask involves strongly interdependent decisions and complex trade-offs that are difficult to cope with for heuristics. We tackle this NP-complete problem with integer linear programming (ILP), a search-based method th...
An efficient memory operations optimization technique for vector loops on Itanium 2 processors
To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of these cache systems on memory instructions scheduling. We demonstrate that, if no care is taken at compile time, the non-precise memory disambiguation mechanism and the banking str...
A New ILP Model for Identical Parallel-Machine Scheduling with Family Setup Times Minimizing the Total Weighted Flow Time by a Genetic Algorithm
This paper presents a novel, integer-linear programming (ILP) model for an identical parallel-machine scheduling problem with family setup times that minimizes the total weighted flow time (TWFT). Some researchers have addressed parallel-machine scheduling problems in the literature over the last three decades. However, the existing studies have been limited to the research of independent jobs,...
Dynamic Profile Driven Code Version Selection
In this paper, we study the effectiveness of dynamic code version selection on Itanium® 2 processors. Code version selection can improve the effectiveness of optimizations, adapting them to multiple input sets. In this performance potential study, we conduct experiments on dual-core Itanium® 2 processors and examine the effectiveness of dynamic code version selection of loop scheduling, l...
CSE231 project report - survey on instruction scheduling
This paper surveys past research on instruction scheduling for exploiting more Instruction Level Parallelism (ILP). We focus on static instruction scheduling performed by the compiler. The hardware platform for implementing such compiler techniques, i.e. VLIW, is also reviewed. We also give a comparison between the code scheduling done dynamically by out-of-order machines and that by compilers, along ...
Publication date: 2004